Welcome to the course Introduction to Data Science with Python

Goals

  • Recognize the uses of the Python language.
  • Understand the basic concepts and functionalities of Python.
  • Develop code to design solutions.

Index

  1. Hello Python
  2. Functions
  3. Booleans and conditionals
  4. Lists
  5. Cycles
  6. Dictionaries
  7. Working with libraries
    1. Library Numpy
    2. Library Matplotlib
    3. Library Plotly
    4. Library Scikit-Learn

Hello Python 🚀

This introductory course covers the basic aspects of the Python language that you will need to get started in Data Science. It is aimed mainly at those who already have experience with code or with another language such as R, Julia, or Matlab.

We start with Python syntax, variable assignment, and arithmetic operations.

In a data science project we usually need to present the results of models, graphs, and analyses. A first message we might display is that our project was successful. To do this, we define a variable that stores the message "The project was successful." and print it.

In [1]:
message="The project was successful."
print(message)
The project was successful.

As you can see, displaying results in Python is not as difficult as we might think.

As the previous code block suggests, Python works with different types of variables, such as numeric and string types, among others. With numeric variables we can perform all kinds of arithmetic operations, and with string variables we can generate text that reports the outcome of a process or even produce documentation.

Below we can see an example of a numeric variable that stores the value 33 and a string variable that stores the text "This is my age".

In [1]:
number=33
phrase="This is my age"

We can also use both variables together and display them as output with the print() function, as shown below.

In [2]:
print(phrase,number)
This is my age 33

We have now seen two types of variables. If we are ever unsure about a definition, we can check the type of the variable we are working with.

In [3]:
type(number)
Out[3]:
int
In [4]:
type(phrase)
Out[4]:
str

As the output shows, the variable number is of type "int", an abbreviation of "integer", whereas the variable phrase is of type "str", short for "string" or text string. We could also define a variable that stores a numerical value as text, but in that case we could not use it as a number in arithmetic operations until we convert it.

In [7]:
other_number="28.2"
type(other_number)

print("The 'other_number' variable stores the value",other_number)
The 'other_number' variable stores the value 28.2
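
A number stored as text can be turned back into a numeric value before doing arithmetic. The following is a minimal sketch using the built-in float() function; the names as_float and total are introduced here only for illustration.

```python
other_number = "28.2"

# float() parses the text and produces a floating-point number
as_float = float(other_number)

# Now the value can take part in arithmetic
total = as_float + 1

print(type(other_number))
print(type(as_float))
print(total)
```

If the text does not represent a number, float() raises a ValueError, so this conversion is only safe when the content is known to be numeric.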

However, if we want to work with numerical variables with decimal values such as 30.2 or 23.4, we can define a variable of type float.

In [8]:
number_float=30.2

print("The number_float variable has the value",number_float,"and its type is",type(number_float))
The number_float variable has the value 30.2 and its type is <class 'float'>

With variables of type integer or float, we can perform the arithmetic operations listed in the following table.

| Syntax | Operation | Description |
|--------|-----------|-------------|
| x + y | Addition | The sum of x and y. |
| x - y | Subtraction | The difference between x and y. |
| x * y | Multiplication | The product of x and y. |
| x / y | Division | The quotient of x and y. |
| x % y | Modulo | The remainder after dividing x by y. |
| x ** y | Exponentiation | The result of raising x to the power y. |
| -x | Negation | The negative value of x. |

Using the table above, we can perform some basic operations such as addition, subtraction, division, and multiplication, after first defining the variables x and y.

In [10]:
x=10
y=2
In [11]:
print(x+2)
12
In [12]:
print(x-2)
8
In [13]:
print("The division of x by y is",x/y)
The division of x by y is 5.0
In [14]:
print("The multiplication between the variables x and y is",x*y)
The multiplication between the variables x and y is 20
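
The cells above cover addition, subtraction, division, and multiplication. The remaining operators from the table (modulo, exponentiation, and negation) can be tried in the same way. This is a minimal sketch; the value y=3 is an assumption chosen so that the remainder is not zero.

```python
x = 10
y = 3  # assumed value, chosen so that x is not a multiple of y

remainder = x % y   # what is left over after dividing x by y
power = x ** y      # x raised to the power y
negated = -x        # the negative of x

print("Remainder:", remainder)
print("Power:", power)
print("Negation:", negated)
```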

We can also perform slightly more complex calculations, such as determining the volume of cubes, cones, or other figures.

In [15]:
face=5
print("A cube of face",face,"has a volume of:",face**3,"cm^3")
A cube of face 5 has a volume of: 125 cm^3
In [18]:
height=2
width=4
length=7

print("The volume of a parallelepiped with height",height,"width",width,"and length",length,"is:",height*width*length,"cm^3")
The volume of a parallelepiped with height 2 width 4 and length 7 is: 56 cm^3

Functions 🎸

In the previous section we saw how to define and manipulate some types of variables and even write some formulas. In other cases we need more complex procedures that require several steps. For these we use functions, which we can call whenever we want.

As a first example, we define a simple function that prints some values on the screen.

In [1]:
def function_balance(name):
    "This is the help of function"
    
    balance=100
    return print("Good Morning: ",name,"the balance in your account is:",balance)

function_balance(name="juan")
Good Morning:  juan the balance in your account is: 100

Every function definition in Python starts with the reserved word def, followed by the name, which in this case is function_balance, and, within parentheses, the parameters that the function accepts. Once the function receives the name argument, a balance of 100 is created and saved in the balance variable. Finally, the function prints a message for the user. Note that print() itself returns None, so return print(...) displays the message but hands None back to the caller.

If we want to know a little more about the function, we can call its help with the help() function.

In [4]:
help(function_balance)
Help on function function_balance in module __main__:

function_balance(name)
    This is the help of function

The output shows the help text of the function. We can make this description more detailed by writing it between triple quotes """ inside the function.

In [2]:
def function_balance(name):
    """This is the help of function_balance. This function
    prints a message with the input name and the default balance,
    which is 100.
    """
    
    balance=100
    return print("Good morning: ",name,"the balance in your account:",balance)

help(function_balance)
Help on function function_balance in module __main__:

function_balance(name)
    This is the help of function_balance. This function
    prints a message with the input name and the default balance,
    which is 100.

Unlike the previous code, the help now explicitly states what the function does, because this description was added in the new definition of function_balance.

We can also define functions that perform more extensive operations, such as mathematical operations whose results depend on the arguments passed to the function.

In this case the difference function determines the absolute differences between 3 variables a, b and c. The differences are computed inside the function and then printed along with text that shows which pair of variables each difference belongs to. Note that help text was also written for this function.

In [3]:
def difference(a,b,c):
    """Print the absolute differences between each pair of the three values received."""
    
    dif1=abs(a-b)
    dif2=abs(a-c)
    dif3=abs(b-c)
    return print("The difference between a and b is",dif1,"The difference between a and c is",dif2,"and the difference between b and c is",dif3)
    

help(difference)

difference(10,20,30)
Help on function difference in module __main__:

difference(a, b, c)
    Print the absolute differences between each pair of the three values received.

The difference between a and b is 10 The difference between a and c is 20 and the difference between b and c is 10
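
The difference function prints its results, so they cannot be reused in later calculations. A possible variant, sketched here under the hypothetical name difference_values, returns the three differences instead so that the caller can keep working with them.

```python
def difference_values(a, b, c):
    """Return the absolute differences between each pair of values."""
    return abs(a - b), abs(a - c), abs(b - c)

# The three returned values can be stored in separate variables
dif1, dif2, dif3 = difference_values(10, 20, 30)
print(dif1, dif2, dif3)

# Because the values were returned, they can be used in further operations
total = dif1 + dif2 + dif3
print(total)
```

Returning values and printing them separately keeps the calculation reusable; whether that is preferable depends on how the function will be used.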

But things can get a little more complicated...👀 In many cases we use functions for specific tasks, but in others we need to apply one function to the output of another. If we create a first function multi_1(x), which multiplies the input value by 2, we can then transform its output with a new function multi_2(fn).

In [2]:
def multi_1(x=2):
    """This first function multiplies the input value by 2; by default the input value is 2."""
    
    y=x*2
    return y

def multi_2(fn):
    """This function takes the output of multi_1 and multiplies it by 3."""

    z=fn*3
    return z


multi_2(multi_1(10))
Out[2]:
60

In this way we can pass the result of one function to another. When we give multi_1() the value 10, it multiplies it by 2 and returns 20. This value is then taken by multi_2(), which multiplies it by 3, giving the value 60 shown on the screen.

Booleans and conditionals 🕹

We already saw that Python has different types of variables such as "integer" and "string", but we will not always work with numbers or text. We can also use "bool" variables, which have 2 possible values: "True" or "False".

We can obtain these variables as the output of other functions, but we can also define and validate this type of variable whenever necessary.

In [70]:
x=True
print(x)
print(type(x))
True
<class 'bool'>

In this case the value of x is "True", and when we check the type of x, the screen shows that it is "bool", or "boolean".

We can also define other variables and check if their values are the same or different.

In [71]:
y=False

x==y
Out[71]:
False

In this case, after defining a new variable y as "False", checking with the "==" operator whether x and y are equal returns "False", because x is assigned the value True whereas y is assigned the value False.

We can also use several comparison operators, each of which returns a Boolean value.

| Operation | Description |
|-----------|-------------|
| x == y | x is equal to y |
| x < y | x is less than y |
| x <= y | x is less than or equal to y |
| x != y | x is not equal to y |
| x > y | x is greater than y |
| x >= y | x is greater than or equal to y |
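
Each operator in the table can be tried directly on two numbers. This is a minimal sketch; the values of x and y are assumptions chosen for illustration.

```python
x = 5
y = 8  # assumed values for illustration

# Each comparison evaluates to a Boolean value
checks = [x == y, x < y, x <= y, x != y, x > y, x >= y]

print("x == y:", x == y)
print("x < y:", x < y)
print("x != y:", x != y)
print(checks)
```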

In this way, we can use this type of variable and these operations inside functions.

In [1]:
def cashier(card):
    """This function validates whether the card has a balance available."""
    
    balance=100
    if card:
        return print("Your balance is",balance)
    else: 
        return print("It is not possible to validate your balance at this moment.")
              

cashier(card=True)
Your balance is 100

This lets us check whether the card variable has the value True. If it does, the cashier function prints the message Your balance is 100; otherwise it reports that the balance cannot be validated.

Functions and conditions

As we saw in the previous code, we can check the value assigned to a numeric, boolean, or string variable using an operator and a condition. If the variable meets the condition, one block of code is executed; otherwise another block is executed.

We can also check more than one condition, as in the following checker function.

In [3]:
def checker(x):
    if x==0:
        print(x,"Is zero")
    elif x>0:
        print(x,"Is positive")
    else: 
        print(x,"Is negative")

checker(0)
checker(-10)
0 Is zero
-10 Is negative

In this case, the function, once it is given a numerical value, is able to determine if it is equal to 0, greater than 0 or even if it is less than 0.

This is done through the if statement, which first checks whether x==0. If that condition is not met, elif checks whether x is greater than 0, and if neither of the previous two conditions is met, the last block of code, print(x,"Is negative"), is executed.

Some final considerations for Boolean variables. We can convert almost any type of data to a Boolean: every nonzero number and every non-empty text is considered True, while the value 0 and empty text are considered False, as in the next case.

In [5]:
print(bool(12))
print(bool("Hi"))
print(bool(0))
print(bool(""))
True
True
False
False
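
The same rule applies to other values. The sketch below checks a few more cases with bool(); the sample values are chosen here for illustration and are not part of the course material.

```python
# Sample values chosen for illustration
samples = [-5, 0.0, [], [0], None]

results = []
for value in samples:
    # bool() applies the same truth test that an if statement uses
    results.append(bool(value))
    print(repr(value), "->", bool(value))
```

Note that a negative number is True because it is nonzero, and a list containing a zero is True because the list itself is not empty.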

Lists 🏗

Lists in Python are a data structure that stores an ordered sequence of values in a single variable, which lets us perform more complex operations than with the variables we already know.

In a list we can store a sequence of values, like the lists assigned to the primes and days variables below.

In [6]:
primes=[2,3,5,7]

days=["Monday","Tuesday","Wednesday","Thursday","Friday"]

As you can see, both are lists; the first stores only numerical values and the second stores only strings.

But we can take it a little further and create a list of lists 👀

In [2]:
cards=[["j","q","k"],[2,3,5],[5,"A","K"]]

In this case we generate a list composed of 3 lists, each of them different. The first contains only string values, the second only numeric values, and the third contains both numeric and string values.

Since this list has a more extensive structure, we might want to select some of its values rather than always deal with the entire cards list. We can select an inner list by its index.

In [3]:
print(cards[1])

print(cards[2])
[2, 3, 5]
[5, 'A', 'K']

Because indexes start at 0, cards[1] selects the second list and cards[2] selects the third. We can even select a single element of these lists, as in the next block.

In [4]:
print(cards[1][0])

print(cards[2][2])
2
K

There you can see how we can select the first element of the second list obtaining the value of 2 or select the third element of the third list obtaining the string K.

If, once we have defined the cards list, we want to replace a hand... sorry, a list, with a new one, we can do it with the following assignment.

In [79]:
cards[0]=["A","J","Q"]

cards
Out[79]:
[['A', 'J', 'Q'], [2, 3, 5], [5, 'A', 'K']]
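
The assignment above replaces the list at position 0. To add a list at the end without replacing anything, one option is the append() method of lists. This is a minimal sketch; the fourth hand is a made-up value.

```python
cards = [["A", "J", "Q"], [2, 3, 5], [5, "A", "K"]]

# append() adds one element at the end of the list; here that element is itself a list
cards.append([7, 8, 9])  # hypothetical fourth hand

print(len(cards))
print(cards[-1])  # index -1 selects the last element
```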

As expected, we can also use some functions on these lists. For example, we might want to count the elements in the days variable 😅... or sort the days alphabetically... or apply mathematical operations such as the sum of the prime numbers or their maximum.

In [7]:
# How many days are there in a week
print(len(days))

# Sort the days alphabetically
print(sorted(days))

# sum of prime numbers
print(sum(primes))

#maximum  of prime numbers
print(max(primes))
5
['Friday', 'Monday', 'Thursday', 'Tuesday', 'Wednesday']
17
7

Methods

In addition to the built-in functions of Python and those that we define ourselves, there are also built-in methods, which depend on the type of variable we are working with.

Some of the methods that Python provides for strings:

| Method | Description |
|--------|-------------|
| capitalize() | Converts the first character of the string to uppercase and the rest to lowercase. |
| index() | Searches for a substring and returns the position of its first occurrence. |
| split() | Splits a string and returns the pieces as a list. |
| upper() | Converts all characters of a string to uppercase. |
| lower() | Converts all characters of a string to lowercase. |

In [1]:
text="This a test text"
text.capitalize()
Out[1]:
'This a test text'
In [2]:
text.index("text")
Out[2]:
12
In [83]:
text.split()
Out[83]:
['This', 'a', 'test', 'text']
In [3]:
text.upper()
Out[3]:
'THIS A TEST TEXT'
In [4]:
text.lower()
Out[4]:
'this a test text'

We could also use methods on numerical variables, lists, and other types. The main difference between a method and a function is that a method always belongs to a class and is called on an object of that class, whereas a function stands on its own.

So far we have not talked about objects, but they are one of the main features of Python, since it is a language that supports Object Oriented Programming (OOP), a topic for a later course.

Tuples

Tuples are similar to lists, but with some differences. We define them in a similar way, as with the tuple saved in the variable t.

In [86]:
t=(1,2,3)
t
Out[86]:
(1, 2, 3)

In this case we only changed the type of bracket, from square brackets to parentheses, to create the tuple t. However, unlike with lists, we cannot change any of the values of a tuple. Below is a forced error... 👀

In [87]:
t[0]=1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [87], line 1
----> 1 t[0]=1

TypeError: 'tuple' object does not support item assignment

Some methods also return tuples. For example, if we have a float variable, we can use the as_integer_ratio() method to obtain a tuple composed of the numerator and denominator of the fraction equal to that float.

In [1]:
x=0.125

numerator,denominator=x.as_integer_ratio()

print(numerator/denominator)
0.125

Cycles 🎰

Life has many cycles... well, not exactly like the ones we are going to see now, but they certainly have a lot in common with routines. As we have seen, we can define lists of text strings; in this case we define a list with the days of the week.

For Loop

In [ ]:
days=["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]

We can then show each day on the screen with a for loop.

In [ ]:
for i in days:
    print(i)

As you can see, the for loop iterates the variable i over the list days, taking each of the values and printing the day on the screen.

On the other hand, if we took a numerical list and iterated it in the same way, it would only print the numerical values of this list.

In [1]:
numbers=[1,2,3,4,5]

for j in numbers:
    print(j)    
1
2
3
4
5

So in this case we changed the list and iterated with a new variable, j.

Furthermore, we can do more than iterate over values: we can also perform an operation within the loop. In this case we define a tuple, multiplicative, and then the product variable.

In [3]:
multiplicative=(2,2,2,3,3,5)
product=1

for i in multiplicative:
     product=product*i

product
Out[3]:
360

In this case the for loop iterates over the values of the multiplicative tuple, and in each iteration it multiplies the product variable by the current value and stores the result back in product.

In [4]:
for i in range(5):
     print("This i is the number: ",i)
This i is the number:  0
This i is the number:  1
This i is the number:  2
This i is the number:  3
This i is the number:  4

Returning to the first cycle, we can also use the range() function to simplify the iteration of the variable i. It is then not necessary to define a list to iterate over: we only pass range() the value up to which we want to iterate, and it generates the integers from 0 up to, but not including, that value.
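
The range() function also accepts a start value and a step. The sketch below builds two small sequences this way; the names evens and countdown are introduced here only for illustration.

```python
# range(start, stop, step): start is included, stop is excluded
evens = list(range(2, 11, 2))

# A negative step counts downwards
countdown = list(range(5, 0, -1))

print(evens)
print(countdown)
```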

While Loop

In addition to the for loop, we may also want to repeat a process while a certain condition is met. For this we use the while loop. Unlike the for loop, which iterates a variable over a sequence, the while loop executes the instructions inside it as long as its condition remains true.

The while loop requires special care: if the condition never becomes false, the loop runs indefinitely, potentially consuming resources until we have to interrupt the process or even restart the PC.

In the next block the variable i=0 is defined. The loop begins by checking the condition i<10; since it is true, it prints the value of the variable and then sets i=i+1. This operation is repeated until i takes the value 9, since when it takes the value 10, the condition i<10 is no longer true.

In [1]:
i=0

while i<10:
    print(i,end=" ")
    i=i+1
0 1 2 3 4 5 6 7 8 9 
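
One way to reduce the risk of a loop that never ends is to add a cap on the number of iterations and leave the loop with break when the cap is reached. This is a minimal sketch; the payment scenario and the names balance, payments, and max_payments are assumptions made up for illustration.

```python
balance = 100
payments = 0
max_payments = 50  # hypothetical safety cap on the number of iterations

while balance > 0:
    balance = balance - 30
    payments = payments + 1
    # Leave the loop if it has run more times than expected
    if payments >= max_payments:
        break

print(payments, balance)
```

Here the condition balance > 0 becomes false on its own after a few iterations, so the cap is never reached; it only matters if the loop body stops making progress.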

On the other hand, if we execute the next block, the cycle runs indefinitely because the condition is always true, which can cause the software to hang until the process is interrupted... 🔥

In [ ]:
i=True

while i==True:
    print(i)
    

Another important aspect of Python is that we can use a loop inside the definition of a list, which is known as a list comprehension. In some cases we only need a few values in a list, but if we want a longer sequence, such as a list with the values from 1 to 20, a simple way to write it is as follows:

In [2]:
serie_1=[n for n in range(1,21)]
serie_1
Out[2]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

Remember that range() stops just before its upper limit, which is why we use 21 and not 20 inside the range() function.
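
A loop written inside a list can also transform and filter the values it produces. The sketch below keeps only the even numbers from 1 to 10 and stores their squares; the name even_squares is introduced here only for illustration.

```python
# For each n from 1 to 10, keep n**2 only when n is even
even_squares = [n**2 for n in range(1, 11) if n % 2 == 0]

print(even_squares)
```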

But we will not always need to create the same list; in some cases we need the list to start or end with different values. For this we can use functions... which are our best ally when we want to encapsulate a process that we carry out with great frequency 📲

In [6]:
def lists(start=1,end=5):
    return [n for n in range(start,end)]

l1=lists()
l1
Out[6]:
[1, 2, 3, 4]
In [7]:
l2=lists(start=10,end=21)
l2
Out[7]:
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

As seen in the first block, once we define the lists function and call it to create our first list l1, the list is created with the default values. However, when we call it to create the list l2 with different limits, the list is immediately created with the values that start at 10 and end at 20.📌

Dictionaries 📗

Up to this point in the course, we have seen most of what we need to start working with Python. One of the areas where Python excels is the manipulation of text strings.

Although we have already defined and performed some operations with strings, it is always good to review before moving on to the most interesting part of this section... the dictionaries. 📚

We saw that text strings can be defined and stored in variables like the ones shown below.

In [ ]:
x="monday is a day"
y="Monday is a day"

As we can see, both variables apparently store the same text, but the first letter is capitalized in only one of them, so checking whether they are equal returns False.

In [ ]:
x==y

On the other hand, suppose we define both text strings in exactly the same way.

In [ ]:
x="Monday is a day"
y="Monday is a day"

x==y

Now we can verify that both text strings are the same.

We might also want to insert a line break in some cases, as in the following.

In [2]:
print("Mond\nay is a day")
Mond
ay is a day

In this case we see that by adding the "\n", immediately everything that comes after is written on the next line... and of course we could use it again if we wanted to write each word on a new line

In [7]:
print("Monday\nis\na\nday")
Monday
is
a
day

Since we have already worked with the print() function, we know that it automatically adds a new line at the end unless we specify a different value for its end argument.

In [8]:
print("Tuesday is another day of the week")
print("Wednesday is another day of the week",end='')
print("Thursday is another day of the week",end='')
Tuesday is another day of the week
Wednesday is another day of the weekThursday is another day of the week

As you can see in the previous case, with the usual print the next text goes to the next line. However, in the second print we add the argument end='', which makes the following text continue on the same line. The default value of end is \n, which we can also pass explicitly.

In [9]:
print("Friday is another day of the week",end='\n')
print("No work on Saturday")
Friday is another day of the week
No work on Saturday

Text strings are not only a set of words; we can also separate them into characters.

If we consider the last text string:

In [ ]:
saturday="No work on Saturday"
saturday[3]

As we can see, it is possible to store a text string in a variable and then select each of its characters by index. We can also concatenate characters and slices to rebuild the entire text string.

In [20]:
saturday="No work on Saturday"
saturday[0]+saturday[1]+saturday[2]+saturday[3:19]
Out[20]:
'No work on Saturday'
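
Slices can also leave out the start or the end, or count from the end of the string with negative indexes. This is a minimal sketch on the same text; the names first_word, second_word, and last_word are introduced here only for illustration.

```python
saturday = "No work on Saturday"

first_word = saturday[:2]     # from the beginning up to, but not including, index 2
second_word = saturday[3:7]   # characters at indexes 3, 4, 5 and 6
last_word = saturday[-8:]     # the last 8 characters

print(first_word)
print(second_word)
print(last_word)
```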

We can even determine the number of characters in the phrase "No work on Saturday".

In [23]:
print('The length of the variable saturday is',len(saturday),'characters')
The length of the variable saturday is 19 characters

Dictionaries

So far we have worked with characters, text strings, and numerical variables. One of the most important data structures in Python is the dictionary, which maps keys to values: each key is associated with one value, and we use the key to look that value up.

In [3]:
numbers={"one":1,"two":2,"three":3}
numbers
numbers["three"]
Out[3]:
3

In the numbers dictionary, "one", "two" and "three" are the keys and 1, 2 and 3 are the values.

In this case we can access each of the values using the keys.

In [5]:
numbers["one"]
Out[5]:
1

So we can select the first value, but we could also add other values using a new key.

In [6]:
numbers["four"]=4
numbers
Out[6]:
{'one': 1, 'two': 2, 'three': 3, 'four': 4}

Or even changing just some of those values already associated with a key.

In [7]:
numbers["one"]="Monday"
numbers
Out[7]:
{'one': 'Monday', 'two': 2, 'three': 3, 'four': 4}

So finally we could even iterate a dictionary using a for loop.

In [8]:
for i in numbers:
    print(numbers[i])
Monday
2
3
4
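
When both the key and the value are needed in each iteration, the items() method of dictionaries provides them as pairs. This is a minimal sketch using the same dictionary contents; the list pairs is introduced here only to collect the results.

```python
numbers = {"one": "Monday", "two": 2, "three": 3, "four": 4}

pairs = []
# items() yields each key together with its value
for key, value in numbers.items():
    pairs.append((key, value))
    print(key, "->", value)
```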

Or even validate if a key is found in the dictionary.

In [9]:
"one" in numbers
Out[9]:
True
In [10]:
"five" in numbers
Out[10]:
False
In [11]:
numbers
Out[11]:
{'one': 'Monday', 'two': 2, 'three': 3, 'four': 4}

Another operation we can do in Python is to build a dictionary by iterating over a list, using each element as a key and computing its value in the same expression.

In [ ]:
days=["Monday","Tuesday","Wednesday","Thursday","Friday"]

days_json={day:day[0] for day in days}
days_json

In this case, a dictionary was created whose keys are the days of the week and whose values are the first character of each key.

The value could also be a shorter version of the key name or a short description, if applicable. By changing the selection from day[0] to day[0:3], we go from taking only the first letter to taking the first 3.

In [12]:
days=["Monday","Tuesday","Wednesday","Thursday","Friday"]

days_json={day:day[0:3] for day in days}
days_json
Out[12]:
{'Monday': 'Mon',
 'Tuesday': 'Tue',
 'Wednesday': 'Wed',
 'Thursday': 'Thu',
 'Friday': 'Fri'}

Finally, we can use methods on the dictionaries we have created to perform some simple tasks.

In [14]:
days_json.values()
Out[14]:
dict_values(['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])

In this case, the .values() method was used to display all the values of the dictionary on the screen.

Working with libraries 🎮

We have covered several topics in this course, from declaring numerical, string, and boolean variables to some data structures. We still have to see how to work with libraries, which are essential for extending the capabilities of Python. We will not see every available library, since there are many: they cover numerical computing, graphs, database connections, user interfaces, and even machine learning models, which is enough material for a more advanced course. Instead, we will look at 4 libraries widely used in both industry and academia.

The first is Numpy, a library specialized in numerical calculation and data analysis. One of its most important features is a data structure called the array, which is generally processed much faster than the lists that Python handles by default 🙄. The second is Matplotlib, a standard library for developing visualizations in Python. The third is Plotly, another visualization library, which I personally like for its syntax and simplicity, and because it is also available for R. Finally, we will see Scikit-Learn, one of the most important and widely used libraries for implementing machine learning models.

Library Numpy

As we said, one of the strong points of the Numpy library is the array object, which is made up of elements of the same type and can have 1, 2, or n dimensions. Thus we can define a list or 1-dimensional array, a matrix or 2-dimensional array, or a cube, which is a 3-dimensional array.

First, however, we need to load the numpy library.

In [ ]:
import numpy

Once loaded we can use all the functions that come within this library. It should be noted that we can load the library with its default name or with an abbreviated form, which will help us every time we call the functions within it.

In [2]:
import numpy as np

Unlike the lists that we define regularly in Python, a 1-dimensional array is created by passing a list to np.array():

In [3]:
a1=np.array([10,20,30,40])
print(a1)
[10 20 30 40]

In the same way, if we wanted to define a 2-dimensional array

In [4]:
a2=np.array([[10,20,30,40],[50,60,70,80]])
print(a2)
[[10 20 30 40]
 [50 60 70 80]]

And continuing with the idea, then a 3-dimensional array can be defined as follows

In [5]:
a3=np.array([[[10,20,30],[40,50,60]],[[70,80,90],[110,120,130]]])
print(a3)
[[[ 10  20  30]
  [ 40  50  60]]

 [[ 70  80  90]
  [110 120 130]]]

Therefore, if we want to access the elements of any type of array, we must use the indexes just as we access the elements of a list, but selecting the indexes of each dimension.

In [ ]:
#To access the element of row 0 column 0 of the 2-dimensional array
print(a2[0,0])

#To access the element of row 1 column 2 of the 2-dimensional array
print(a2[1,2])

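We can also read some attributes to obtain important characteristics of an array, such as those shown in the following table. These are attributes, so they are written without parentheses; Numpy also provides equivalent functions such as np.ndim(a) and np.shape(a), which the cells below use.

| Characteristic | Attribute | Detail |
|----------------|-----------|--------|
| Number of dimensions | a.ndim | Returns the number of dimensions of the array a. |
| Dimensions | a.shape | Returns a tuple with the dimensions of the array a. |
| Size | a.size | Returns the number of elements in the array a. |
| Data type | a.dtype | Returns the data type of the elements of the array a. |

In [6]:
# Number of dimensions for each of the arrays
print(np.ndim(a1))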
print(np.ndim(a2))
print(np.ndim(a3))
1
2
3
In [ ]:
# Dimensions for each of the arrays
print(np.shape(a1))
print(np.shape(a2))
print(np.shape(a3))

As you can see, np.ndim() shows only the number of dimensions, while np.shape() shows the explicit size of each dimension.
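
The same information is available directly on the array as attributes, read without parentheses. This is a minimal sketch that recreates the 2-dimensional array so it can run on its own; the exact integer data type printed depends on the platform.

```python
import numpy as np

a2 = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])

print(a2.ndim)    # number of dimensions
print(a2.shape)   # size of each dimension, as a tuple
print(a2.size)    # total number of elements
print(a2.dtype)   # data type of the elements (platform dependent for integers)
```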

We can also perform arithmetic between an array and a single number, or between arrays whose shapes are compatible. The operation is applied element by element.

In [ ]:
print(a1*2)
In [ ]:
print(2*a2/a2)
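
Arrays with different numbers of dimensions can also be combined when their shapes are compatible. In the sketch below, the 1-dimensional array has the same length as each row of the 2-dimensional array, so Numpy adds it to every row; the name combined is introduced here only for illustration.

```python
import numpy as np

a1 = np.array([10, 20, 30, 40])
a2 = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])

# a1 has shape (4,) and a2 has shape (2, 4), so a1 is added to each row of a2
combined = a2 + a1

print(combined)
print(combined.shape)
```

If the length of a1 did not match the row length of a2, this addition would raise a ValueError instead.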

In the same way we could perform algebraic operations with vectors and matrices

In [ ]:
import numpy as np
b1=np.array([1,2,3])
b2=np.array([1,0,1])

#b1.dot(b2) determines the dot product between the vectors b1 and b2
print(b1.dot(b2))

So we could also transpose a matrix using a.T.

In [ ]:
import numpy as np
a=np.array([[1,2,3],[4,5,6]])

print(a)
print(a.T)

We can even solve systems of linear equations using np.linalg.solve(a,b).

In [7]:
import numpy as np

# The following is a system of 2 linear equations with 2 variables
#2x+3y=10
#4x+5y=8

a=np.array([[2,3],[4,5]])
b=np.array([10,8])
print(np.linalg.solve(a,b))
[-13.  12.]
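
A simple way to check a solution is to multiply the coefficient matrix by it and compare the result with the right-hand side. This is a minimal sketch of that check; np.allclose is used for the comparison because the results are floating-point numbers.

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])
b = np.array([10, 8])

solution = np.linalg.solve(a, b)

# Multiplying a by the solution should reproduce b
check = a.dot(solution)

print(check)
print(np.allclose(check, b))
```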

Some other functions to work with linear algebra are the following. The functions norm, det, inv, eigvals, eig and solve belong to the np.linalg submodule, so they are called as np.linalg.norm(a), np.linalg.det(a) and so on.

Method       Operation                           Description
a.dot(b)     Scalar product                      Determines the scalar product between vectors a and b.
norm(a)      Module of a vector                  Determines the module (length) of the vector a.
a.dot(b)     Product of 2 matrices               Determines the matrix product of matrices a and b.
a.T          Transposed matrix                   Determines the transposed matrix of the matrix a.
a.trace()    Trace of a matrix                   Determines the main diagonal sum of the square matrix a.
det(a)       Determinant of a matrix             Returns the determinant of the matrix a.
inv(a)       Inverse matrix                      Determines the inverse matrix of the square matrix a.
eigvals(a)   Eigenvalues of a matrix             Determines the eigenvalues of the square matrix a.
eig(a)       Eigenvectors of a matrix            Determines the eigenvalues and eigenvectors of the square matrix a.
solve(a,b)   Solution of a system of equations   Determines the solution of a system of linear equations.
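
The following sketch tries several of the functions from the table. The matrix `m` and the vector `v` are hypothetical values chosen so that the results are easy to check by hand.

```python
import numpy as np

# Hypothetical 2x2 matrix and vector, used only for illustration
m = np.array([[2.0, 0.0], [0.0, 3.0]])
v = np.array([3.0, 4.0])

print(np.linalg.norm(v))     # module (length) of the vector
print(m.trace())             # sum of the main diagonal
print(np.linalg.det(m))      # determinant
print(np.linalg.inv(m))      # inverse matrix
print(np.linalg.eigvals(m))  # eigenvalues only

# eig returns both the eigenvalues and the eigenvectors (one per column)
values, vectors = np.linalg.eig(m)
print(values)
print(vectors)
```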

Library Matplotlib¶


The time has come... 🥁 So far we have seen how to define variables of different types, data structures and other relevant aspects of Python, and we have even loaded Numpy, our first library.

Now it is the turn of the Matplotlib library, which was developed for creating graphics. It should be noted that it is not the only library for developing graphics.

Some of the graphs that can be developed with Matplotlib and that we will see are the following:

  • Scatter plots
  • Line diagrams
  • Box plots
  • Bar charts
  • Histogram

Scatter plots are among the most used graphics, since they quickly show how the data is distributed in space. In some cases we will need other graphics to represent the information better, but they are a good starting point. 💡

In [1]:
#We bring the pyplot module in abbreviated form as plt
import matplotlib.pyplot as plt

#The figure and axes are created
fig,ax=plt.subplots()

#The x and y coordinates of the points to plot are defined
ax.scatter(x=[1,1.5,2,2.5,3],y=[1,1.5,2,1.5,1])

#Optionally we can define the size of the figure
fig.set_size_inches(4,4)

#Finally we show the graph
plt.show()

Each of the graphs mentioned above has a function developed specifically for it, as shown below:

Scatterplots¶

To develop a scatter plot, which shows points by their coordinates on the x and y axes, we use the function scatter(x,y), as in the previous example. Here we represent a larger data set.

In [2]:
import matplotlib.pyplot as plt

fig,ax=plt.subplots()
ax.scatter([1,3,0.5,5,0.4,1.9,2.6],[0.9,3,5,2.5,7.6,4.3,7])

#Remember that defining the size of the figure is optional
fig.set_size_inches(4,4)

plt.show()

Line diagrams¶

A line diagram again represents a set of points with coordinates in x and y, but here each pair of successive points is joined by a line segment. For this we use the function plot(x,y).

In [3]:
import matplotlib.pyplot as plt

fig,ax=plt.subplots()
ax.plot([1.5,2,3,4],[1,0.75,1.5,0.5])

fig.set_size_inches(4,4)

plt.show()

Boxplots¶

Unlike the previous graphs, the box plot summarizes some relevant statistics of a sample, such as the median, the quartiles and possible outliers. In this case we use the boxplot(x) function.

In [4]:
import matplotlib.pyplot as plt

fig,ax=plt.subplots()
ax.boxplot([2.3,4.5,1,8,10,4.5,5.6,7.6,3.4,2.4,20])

fig.set_size_inches(4,4)

plt.show()
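
The statistics drawn by the box plot can also be computed directly. The sketch below uses the same sample and np.percentile to obtain the quartiles, and then applies the 1.5 × IQR rule, which is Matplotlib's default setting (whis=1.5), to identify outliers. The variable names are introduced here only for illustration.

```python
import numpy as np

sample = np.array([2.3, 4.5, 1, 8, 10, 4.5, 5.6, 7.6, 3.4, 2.4, 20])

# Quartiles of the sample: Q1, median (Q2) and Q3
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Points further than 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(q1, median, q3)
print(outliers)
```

In this sample the value 20 lies above the upper fence, which is why it appears as an isolated point in the graph.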

Bar charts¶

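A bar chart uses two axes: one holds the categories and the other the magnitude counted for each category. In a horizontal bar chart the categories go on the y-axis and the magnitudes on the x-axis.

To develop a horizontal bar chart we use the syntax barh(y,width), where y gives the position of each bar and width its length. The function bar(x,height) draws the vertical equivalent.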

In [5]:
import matplotlib.pyplot as plt

fig,ax=plt.subplots()
ax.barh([1,2,3],[3,2,1])

fig.set_size_inches(4,4)

plt.show()
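
As a variation on the example above, barh also accepts text labels for the categories. The category names and counts below are hypothetical and only illustrate which argument ends up on which axis.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and counts, used only for illustration
categories = ["A", "B", "C"]
counts = [3, 2, 1]

fig, ax = plt.subplots()

# First argument: position of each bar on the y-axis; second: length of each bar
bars = ax.barh(categories, counts)

ax.set_xlabel("Count")
ax.set_ylabel("Category")
fig.set_size_inches(4, 4)

plt.show()
```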

Histogram¶

Finally, the histogram is, along with the box plot, one of the most used graphs in descriptive and inferential statistics. It shows the frequency distribution of a sample once the data has been grouped into intervals. It should be noted that we can define the number of intervals, and therefore columns, that we want our diagram to have.

In this case we will also use the numpy library to generate 1000 random numbers using a normal distribution with $\bar{X}=10$ and $\sigma=0.8$.

In [6]:
import numpy as np
import matplotlib.pyplot as plt

fig,ax=plt.subplots()
x=np.random.normal(10,0.8,1000)
ax.hist(x,10)

fig.set_size_inches(4,4)

plt.show()
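
The grouping that the histogram draws can be inspected with np.histogram, which returns the count of observations in each interval together with the interval edges. The seeded random generator is an assumption added here so that the sample is reproducible; the example above uses np.random.normal without a seed.

```python
import numpy as np

# A seeded generator (an assumption added here) makes the sample reproducible
rng = np.random.default_rng(seed=0)
x = rng.normal(10, 0.8, 1000)

# counts[i] is the number of observations between edges[i] and edges[i + 1]
counts, edges = np.histogram(x, bins=10)

print(counts)
print(edges)
```

With 10 intervals there are 11 edges, and the counts add up to the 1000 observations of the sample.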

Library Plotly¶


As with the Matplotlib library, Plotly lets us develop graphics, but it offers a wider range of them, including statistical, financial, geographical, scientific and 3-dimensional graphics. Despite this variety we will only see the main ones, since more advanced graphics require more elaborate code and, in some cases, the use of other libraries.

In addition, Plotly produces interactive graphics that can be displayed in a Jupyter notebook like this one, saved in HTML format or even embedded in applications built with Dash, Shiny or other frameworks.

As we did with Matplotlib, we start with scatter plots. To develop a first graph of this type we must first load the Plotly library.

In [1]:
import plotly.express as px

#The offline submodule is imported to then use plotly in the HTML document
import plotly.offline as pyo
pyo.init_notebook_mode()

fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.show()

Although the scatter plots developed with Matplotlib and Plotly look only slightly different, the most important point is that Plotly, with a very similar syntax, also lets us interact with the graph, using tools such as zoom, cropping and even downloading the graph.

Some other graphics that we can develop with the Plotly library are the following:

  • Line charts
  • Bar charts
  • Pie charts
  • Histograms
  • Box plots

Line Charts¶

As with the scatter plot, we can also develop line charts, which can show one or more series if necessary. The syntax for this type of graph is shown below.

In [2]:
import plotly.express as px
import pandas as pd

fig= px.line(
     x = [1, 2, 3, 4],
     y = [1, 2, 3, 4]
)
fig.show()

Bar Charts¶

When we make a bar chart, the categories we want to display are placed on the x axis and the values or quantities for each of these categories on the y axis. To develop the graph, let's consider the following table:

Country       Medals   Quantity
South Korea   Gold     25
China         Gold     10
Canada        Gold     9
South Korea   Silver   13
China         Silver   15
Canada        Silver   12
South Korea   Bronze   11
China         Bronze   8
Canada        Bronze   12
In [3]:
import plotly.express as px

#Here we load the medals data set from the table, which comes preloaded in the library (its columns are named nation, medal and count)
long_df = px.data.medals_long()

fig = px.bar(long_df, x="nation", y="count", color="medal", title="Medal podium by country")
fig.show()

Pie Charts¶

In the previous graphs the x and y axes held the categories and the values. In a pie chart, by contrast, we represent a set of categories by the share that each value contributes to the total.

Below you can see the syntax to build the graph.

In [2]:
import plotly.graph_objects as go

labels = ['Oxygen','Hydrogen','Carbon Dioxide','Nitrogen']
values = [4450, 2340, 1124, 670]

fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.show()

Histograms¶

Just as we saw with the Matplotlib library, the histogram is a diagram that represents the frequency distribution of a sample.

In this case we will also use the numpy library to generate a sample of 500 values using a normal distribution with parameters $N(\bar{X}=100,\sigma=2)$.

In [1]:
import plotly.express as px
import numpy as np

data = np.random.normal(100, 2, size=500) # replace with your own data source
fig = px.histogram(data, range_x=[90, 110])
fig.show()

Box Plots¶

The box plot is not equivalent to the histogram, since it does not show frequencies. Instead, it gives a condensed view of the observations, where you can directly read some relevant statistics such as $Q_{1}$, $Q_{2}$ (the median), $Q_{3}$, the maximum and the minimum, in addition to the outliers.

In this case the sample is generated with a uniform distribution $U(a=50,b=100)$ and, in addition to the box plot, the points="all" parameter is enabled in the function so that all the observations are also plotted.

In [3]:
import plotly.express as px
import numpy as np

data = np.random.uniform(50,100, size=500)
fig = px.box(data,points="all")

fig.show()

Library Scikit-Learn 📀¶


The Scikit-Learn library is a module developed in Python for the design and implementation of Machine Learning (ML) models. Today ML ranges from classic inferential models to far more advanced ones, but in this course we will only address some of the inferential models.

Although there are various libraries for implementing ML models, such as TensorFlow, Keras and Pytorch, Scikit-Learn is one of the most popular, due to its wide variety of regression, classification and clustering algorithms, among others.

Despite the different models offered by the library, we will only see in an introductory way 2 of the classic algorithms, corresponding to:

  • Linear regression
  • Logistic regression

Linear regression¶

Now to what we came for... In inferential statistics, one of the classic models is linear regression, a mathematical model that establishes an approximate relationship between several independent variables $X_{i}$ and a dependent variable $Y$. We say that this relationship is approximate because the model also includes an error term $\epsilon$. The linear regression model can be expressed as follows:

$Y=\beta_{0}+\sum_{i=1}^{n}\beta_{i}X_{i}+\epsilon=\beta_{0}+\beta_{1}X_{1}+...+\beta_{n}X_{n}+\epsilon$

It should be noted that here we are generalizing the linear regression model to $n$ independent variables. Only when $n \leq 2$ is it possible to graph the fitted function (a line for $n=1$, a plane for $n=2$). When $n > 2$ we are dealing with a hyperplane, which cannot be graphed, although we can still perform any type of operation with, or implementation of, the linear model.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Loading the iris data set
iris = datasets.load_iris()
iris_df=pd.DataFrame(iris.data)

# Names are assigned for each column of the dataset
iris_df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid']

# The variable "sepal_len" is selected as the independent variable and "petal_len" as the dependent variable
data_x = iris_df.iloc[:,[0]]
data_y = iris_df.iloc[:,[2]]

# A linear regression model is instantiated
regr = linear_model.LinearRegression()


# The linear regression model is fitted with the variables x and y
regr.fit(data_x, data_y)

y_pred = regr.predict(data_x)

# Model coefficient: $B0 and B1$
print("Coefficients B0 and B1: \n", regr.intercept_,regr.coef_)
# Mean square error
print("Mean Squared Error: %.2f" % mean_squared_error(data_y, y_pred))
# Determination coefficient of the linear model:
print("Coefficient of determination: %.2f" % r2_score(data_y, y_pred))
Coefficients B0 and B1: 
 [-7.10144337] [[1.85843298]]
Mean Squared Error: 0.74
Coefficient of determination: 0.76
In [4]:
# Data dispersion and linear model
plt.scatter(data_x, data_y, color="blue")
plt.plot(data_x, y_pred, color="red", linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()
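
To see what the fit is doing, the sketch below estimates $\beta_{0}$ and $\beta_{1}$ by ordinary least squares with NumPy alone and computes the mean squared error and the coefficient of determination from their definitions. The data points are hypothetical values invented for illustration, not the iris data, and the variable names are assumptions of this sketch.

```python
import numpy as np

# Hypothetical observations, used only for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix: a column of ones for the intercept and a column with x
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of [beta_0, beta_1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ beta

# Mean squared error and coefficient of determination
mse = np.mean((y - y_pred) ** 2)
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(beta)
print(mse, r2)
```

The same quantities are what regr.intercept_, regr.coef_, mean_squared_error and r2_score report in the Scikit-Learn example.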

Logistic regression¶

Unlike the linear regression model, the logistic regression model is used to predict categorical variables from one or more independent variables. Although both this model and linear regression fall within the category of Generalized Linear Models, and several configurations can be defined, we will focus only on implementing a model of the following type:

$Y=\frac{1}{1+e^{-(\beta_{0}+\beta_{1} X_{1}+...+\beta_{n} X_{n})}}$

As with the linear regression model, here we are generalizing the classification model to $n$ independent variables. However, the model we implement below considers only 4 independent variables and 3 categories in the dependent variable $Y$.

In [66]:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import warnings

# Warnings are ignored
warnings.filterwarnings("ignore")

# The iris data set is loaded
data = datasets.load_iris()
x = data.data
y = data.target

# A logistic regression model is instantiated
modelo = LogisticRegression()

# The model is fitted to the data
modelo.fit(x, y)

# Predictions are made
y_pred = modelo.predict(x)
In [68]:
# Compute the accuracy of the model
accuracy = accuracy_score(y, y_pred)
print("Model accuracy:", accuracy)

# Show the confusion matrix
confusion = confusion_matrix(y, y_pred)
print("Confusion matrix:")
print(confusion)

# Show a classification report
report = classification_report(y, y_pred)
print("Classification report:")
print(report)
Model accuracy: 0.9733333333333334
Confusion matrix:
[[50  0  0]
 [ 0 47  3]
 [ 0  1 49]]
Classification report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.98      0.94      0.96        50
           2       0.94      0.98      0.96        50

    accuracy                           0.97       150
   macro avg       0.97      0.97      0.97       150
weighted avg       0.97      0.97      0.97       150
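
The logistic formula given earlier in this section can be sketched in a few lines of NumPy. The sketch covers only the two-category case of that formula; the three-category model fitted above generalizes the same idea. The coefficients and the observation are hypothetical values chosen for illustration, and the function name `logistic` is an assumption of this sketch.

```python
import numpy as np

def logistic(z):
    """Map a linear predictor z to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and observation, used only for illustration
beta_0 = -1.0
beta = np.array([0.5, 0.25])
x = np.array([2.0, 0.0])

# Linear predictor: -1.0 + 0.5 * 2.0 + 0.25 * 0.0 = 0.0
z = beta_0 + beta @ x

print(logistic(z))
```

A linear predictor of 0 gives a probability of 0.5; large positive values approach 1 and large negative values approach 0.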

And so we reach the end of this introductory course on Python for Data Science. The course offers a practical guide on how to start working with Python from a functional point of view. Some concepts are not covered here but will be addressed in later courses, which go much deeper into topics such as Object-Oriented Programming, Algorithms, Visualization Development and even other Machine Learning models.

If you have any questions or comments, you can write to me at jvenegasg@docente.uss.cl or j.venegasgutierrez@gmail.com

Regards 🧭